# DPO Reinforcement Learning
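Direct Preference Optimization (DPO; Rafailov et al., 2023) aligns a language model to human preferences without a separate reward model or PPO loop. Given a prompt $x$ with a preferred response $y_w$ and a rejected response $y_l$, it directly minimizes

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right],
$$

where $\pi_{\mathrm{ref}}$ is a frozen reference model and $\beta$ controls how far the policy $\pi_\theta$ may drift from it. The models below all apply this objective to different base models and preference datasets.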

## Self-BioRAG 7B OLAPH
A fine-tune of Minbyul/selfbiorag-7b-wo-kqa_golden-iter-dpo-step3-filtered, trained with Direct Preference Optimization (DPO) on the HuggingFace MedLFQA dataset (excluding kqa_golden).
Tags: Large Language Model · Transformers · English
Publisher: dmis-lab
## Noromaid 7B 0.4 DPO
A 7B-parameter large language model co-created by IkariDev and Undi, optimized with DPO training.
Tags: Large Language Model · Transformers
Publisher: NeverSleep
## DPOpenHermes 7B v2
License: Apache-2.0
The second RL fine-tune of OpenHermes-2.5-Mistral-7B, trained with Direct Preference Optimization (DPO) on the Intel/orca_dpo_pairs and allenai/ultrafeedback_binarized_cleaned preference datasets.
Tags: Large Language Model · Transformers · English
Publisher: openaccess-ai-collective
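To make the objective above concrete, here is a minimal sketch of the per-pair DPO loss in plain PyTorch. The function name and arguments are illustrative, not taken from any of the repositories on this page; each `*_logps` tensor is assumed to hold the summed token log-probability of a completion under the trainable policy or the frozen reference model.

```python
# Minimal sketch of the DPO loss over a batch of preference pairs.
# All names here are illustrative assumptions, not any project's API.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss; each argument is the summed log-probability of the
    chosen/rejected completion under the policy or reference model."""
    # Implicit reward margins: how much more the policy prefers each
    # completion than the frozen reference model does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid of the margin; minimized when the chosen completion
    # outscores the rejected one by a wide margin.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In practice, fine-tunes like the ones listed here are typically produced with a preference-optimization trainer (for example, Hugging Face TRL's `DPOTrainer`) rather than a hand-rolled loop; the loss above is what such a trainer minimizes for each chosen/rejected pair.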
## 14B DPO Alpha
CausalLM/14B-DPO-α is a large-scale causal language model for Chinese and English text generation, with strong results on MT-Bench evaluations.
Tags: Large Language Model · Transformers · Multilingual
Publisher: CausalLM